X-SRQ - Improving Scalability and Performance of Multi-core InfiniBand Clusters

Authors

  • Galen M. Shipman
  • Stephen W. Poole
  • Pavel Shamis
  • Ishai Rabinovitz
Abstract

To improve the scalability of InfiniBand on large-scale clusters, Open MPI introduced a protocol known as B-SRQ [2]. This protocol was shown to provide much better memory utilization of send and receive buffers for a wide variety of benchmarks and real-world applications. Unfortunately, B-SRQ increases the number of connections between communicating peers: while addressing one scalability problem of InfiniBand, the protocol introduced another. To alleviate the connection scalability problem of the B-SRQ protocol, a small enhancement to the reliable connection transport was requested that would allow multiple shared receive queues to be attached to a single reliable connection. This modified transport is now known as the extended reliable connection transport. X-SRQ is a new transport protocol in Open MPI, based on B-SRQ, that takes advantage of this improvement in connection scalability. This paper introduces the X-SRQ protocol and details its significantly improved scalability over B-SRQ and its reduction of the memory footprint of connection state by as much as two orders of magnitude on large-scale multi-core systems. In addition to improving scalability, the performance of latency-sensitive collective operations is improved by up to 38%, while the variability of results decreases significantly. A detailed analysis of the improved memory scalability and the improved performance is presented.
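
The connection-count argument in the abstract can be illustrated with a small sketch. It assumes, as the abstract implies but does not state, that B-SRQ opens one reliable connection per shared receive queue for each peer, while X-SRQ attaches all of a peer's shared receive queues to a single connection. The function name and the example numbers (1023 peers, 4 shared receive queues) are hypothetical and are not taken from the paper.

```python
def connections_per_process(peers: int, srqs_per_peer: int, shared_connection: bool) -> int:
    """Count the reliable connections one process opens to its peers.

    shared_connection=False models the assumed B-SRQ behaviour
    (one connection per shared receive queue per peer).
    shared_connection=True models the assumed X-SRQ behaviour
    (all shared receive queues attached to one connection per peer).
    """
    if peers < 0 or srqs_per_peer < 1:
        raise ValueError("peers must be >= 0 and srqs_per_peer must be >= 1")
    per_peer = 1 if shared_connection else srqs_per_peer
    return peers * per_peer


# Hypothetical example: 1023 peers, 4 shared receive queues per peer.
b_srq = connections_per_process(1023, 4, shared_connection=False)
x_srq = connections_per_process(1023, 4, shared_connection=True)
print(b_srq, x_srq)
```

Under these assumptions the connection count shrinks by the number of shared receive queues per peer. The sketch covers only that factor, so it does not by itself account for the memory-footprint reduction reported in the abstract.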

Similar resources

Multi-connection and Multi-core Aware All-gather on Infiniband Clusters

MPI_Allgather is a collective communication operation that is intensively used in many scientific applications. Due to high data exchange volume in MPI_Allgather, efficient and scalable implementation of this operation is critical to the performance of scientific applications running on emerging multi-core clusters. Mellanox ConnectX is a modern InfiniBand host channel adapter that is able to s...

Hybrid-Parallel Sparse Matrix–Vector Multiplication and Iterative Linear Solvers with the communication library GPI

We present a library of Krylov subspace iterative solvers built over the PGAS-type communication layer GPI. The hybrid pattern is here the appropriate choice to reveal the hierarchical parallelism of clusters with multi- and manycore nodes. Our approach includes asynchronous communication and differs in many aspects from the classical one. We first present the GPI-based implementation of the spar...

Performance comparison between a massive SMP machine and clusters

In this brief report we compare the performance of selected scientific applications on sun-test, a massive 32-core SMP machine, with that on blade and zebra, the two main partitions of the production cluster at SISSA. Tables 1 and 2 present some details on the hardware. Information on the interconnection network is available in Table 3, along with some network performance results, obtained with ...

A flexible Patch-based lattice Boltzmann parallelization approach for heterogeneous GPU-CPU clusters

Sustaining a large fraction of single GPU performance in parallel computations is considered to be the major problem of GPU-based clusters. In this article, this topic is addressed in the context of a lattice Boltzmann flow solver that is integrated in the WaLBerla software framework. We propose a multi-GPU implementation using a block-structured MPI parallelization, suitable for load balancing...

HOMME and POPperf High Performance Applications: Optimizations for Scale

The High Order Method Modeling Environment (HOMME) and the modified version of The Parallel Ocean Program (POPperf) are two important applications for atmospheric and weather research. With an emphasis on efficiency, portability, maintainability and, most importantly, scalability, HOMME and POPperf have been successfully deployed over the years on a wide variety of high-performance systems, such ...

Publication date: 2008